Business Forecasting › Class Slides

Time Series Regression

Lecture 6

How can past and current values of other variables help us forecast?

A linear regression model expresses the forecast variable as a linear function of predictors.
The multiple linear regression model:
yt = β0 + β1x1,t + β2x2,t + … + βkxk,t + εt
yt is the forecast variable (dependent variable) at time t.
xj,t are the predictor variables (independent variables).
βj are the unknown coefficients, estimated from data.
εt is the error term — the part of yt not explained by the predictors.

How are the regression coefficients estimated?

Ordinary Least Squares minimizes the sum of squared residuals.
OLS chooses β̂ to minimize:
SSR = ∑t=1T et² = ∑t=1T (yt − β0 − β1x1,t − … − βkxk,t)²
The solution (in matrix form) is:
β̂ = (X′X)⁻¹X′y
Interpretation: β̂j is the estimated change in y associated with a one-unit increase in xj, holding all other predictors constant.
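For intuition, the one-predictor case can be computed by hand: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. A minimal Python sketch with made-up numbers (the matrix formula above generalizes this to many predictors):

```python
# Minimal OLS sketch for one predictor: beta1 = Cov(x,y)/Var(x),
# beta0 = ybar - beta1*xbar. Data are hypothetical, roughly y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

beta1 = sxy / sxx                      # estimated slope (about 1.99 here)
beta0 = ybar - beta1 * xbar            # estimated intercept
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in residuals)   # the quantity OLS minimizes

print(round(beta1, 3), round(beta0, 3), round(ssr, 4))
```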

The Gauss-Markov assumptions make OLS optimal.

1. Linearity — the true relationship is linear in the parameters.
  • Nonlinear relationships require transformations or nonlinear models.
2. No perfect multicollinearity — predictors are not exact linear combinations of each other.
  • Perfect multicollinearity makes (X′X) non-invertible and coefficients undefined.
3. Zero conditional mean — E[εt | X] = 0.
  • Requires exogeneity: no omitted variables correlated with X.
4. Homoskedasticity — Var(εt | X) = σ² (constant).
  • Violated when residual spread grows with the level of X or time.
5. No serial correlation — Cov(εt, εs) = 0 for t ≠ s.
  • Often violated in time series data — this is a key challenge.
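Assumption 2 can be seen mechanically: a Python sketch (hypothetical data, not R) showing that when one predictor is an exact multiple of another, X′X has determinant zero and cannot be inverted:

```python
# With perfectly collinear predictors, X'X is singular (det = 0),
# so (X'X)^{-1} does not exist and OLS coefficients are undefined.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]          # = 2 * x1: perfect multicollinearity
X = [[1.0, a, b] for a, b in zip(x1, x2)]   # design matrix with intercept

# X'X is 3x3 here (intercept + 2 predictors)
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)]
       for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(det3(XtX))  # 0.0: X'X is not invertible
```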
Serial correlation in residuals is the biggest threat in time series regression.
If the error terms εt are correlated across time, OLS standard errors are wrong — usually too small. This leads to:
  • t-statistics that appear significant when they are not.
  • Confidence intervals that are too narrow.
  • Prediction intervals that understate true uncertainty.
Detection: plot the ACF of residuals; use the Breusch-Godfrey test (preferred over Durbin-Watson for multiple lags).
Fix: model the serial correlation explicitly by adding lagged dependent variables, ARIMA errors, or moving to a dynamic regression framework.
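The ACF diagnostic amounts to a short computation: correlate the residual series with lagged copies of itself and compare against the rough ±2/√T band. A Python sketch with a made-up residual series:

```python
# Sample autocorrelation of residuals at lags 1..max_lag. Residuals from a
# well-specified model should stay roughly within +/- 2/sqrt(T).
def acf(e, max_lag):
    T = len(e)
    mean = sum(e) / T
    denom = sum((v - mean) ** 2 for v in e)
    return [sum((e[t] - mean) * (e[t - k] - mean) for t in range(k, T)) / denom
            for k in range(1, max_lag + 1)]

# Hypothetical, strongly autocorrelated "residuals" (slow drift, AR(1)-like):
e = [1.0, 0.9, 0.7, 0.6, 0.4, 0.1, -0.2, -0.4, -0.5, -0.7, -0.8, -1.0]
r = acf(e, 3)
threshold = 2 / len(e) ** 0.5
print([round(v, 2) for v in r])
print(r[0] > threshold)   # True: lag-1 spike signals serial correlation
```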
Heteroskedasticity means the error variance is not constant.
In time series, heteroskedasticity often appears as residuals that are larger during volatile periods (recessions, crises) than during stable ones.
Consequence: OLS is still unbiased, but standard errors are incorrect. Test statistics and prediction intervals are unreliable.
Detection: plot residuals vs. fitted values and vs. time. A “fanning out” pattern indicates heteroskedasticity.
Fixes:
  • Log-transform the dependent variable (if variance grows with level).
  • Use heteroskedasticity-consistent (HC) standard errors — also called “robust” standard errors.
  • In R: coeftest(fit, vcov = vcovHC(fit)) (from the lmtest and sandwich packages), or TSLM() with robust SEs.
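The difference between classical and robust standard errors is easy to see in the one-predictor case. A Python sketch with hypothetical data whose error spread grows with x (the classical formula pools all squared residuals; HC0 weights each one by its leverage on the slope):

```python
# Classical vs heteroskedasticity-consistent (HC0, "White") standard
# errors for the slope in a one-predictor regression.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.7, 4.9, 4.2, 7.5]   # spread grows with x (hypothetical)

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Classical: assumes a single constant error variance sigma^2.
sigma2 = sum(ei ** 2 for ei in e) / (n - 2)
se_classical = (sigma2 / sxx) ** 0.5

# HC0: each squared residual keeps its own weight.
se_hc0 = (sum(((xi - xbar) ** 2) * ei ** 2 for xi, ei in zip(x, e))
          / sxx ** 2) ** 0.5

print(round(se_classical, 3), round(se_hc0, 3))
```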

What is endogeneity, and why does it matter for forecasting?

Endogeneity occurs when a predictor is correlated with the error term.
This violates the zero conditional mean assumption and makes OLS estimates biased and inconsistent. Common sources:
  • Omitted variables — a variable correlated with both x and y is left out.
  • Reverse causality — y causes x as well as x causing y.
  • Measurement error in the predictor — the observed x differs from the true x.
In a forecasting context: endogeneity is less critical if the goal is prediction accuracy, not causal interpretation. A biased coefficient can still produce good forecasts if the correlation between x and y is stable. But it matters when you want to understand why a forecast is high or low.

Choosing useful predictors for time series regression

Trend — a time index captures linear growth or decline.
  • yt = β0 + β1t + εt
  • Extend with a t² term for quadratic trends.
Dummy variables for seasonality.
  • One dummy per season minus one (to avoid perfect multicollinearity with the intercept).
  • Coefficient on “January dummy” = average January deviation from the baseline month.
Intervention variables for structural breaks.
  • A step dummy (0 before event, 1 after) captures a permanent level shift.
  • A spike dummy (1 only at time t0) captures a one-off outlier.
Lagged predictors — when predictors affect y with a delay.
  • Advertising spend in month t−1 may predict sales in month t.
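All four predictor types above are just constructed columns. A Python sketch building them for a short hypothetical series (season length, intervention date, and advertising values are made up):

```python
# Building regression predictors: linear trend, seasonal dummies
# (one per season minus one), a step dummy for an intervention at t0,
# and a lag-1 copy of a predictor.
T = 8
season_period = 4          # "quarterly" seasonality for brevity
t0 = 5                     # intervention date (1-indexed)
advertising = [3.0, 2.5, 4.0, 3.5, 5.0, 4.5, 6.0, 5.5]

trend = list(range(1, T + 1))
# Season 1 is the baseline; seasons 2..4 each get a dummy column.
season = [(t - 1) % season_period + 1 for t in trend]
dummies = {s: [1 if st == s else 0 for st in season]
           for s in range(2, season_period + 1)}
step = [1 if t >= t0 else 0 for t in trend]    # permanent level shift
lag1_adv = [None] + advertising[:-1]           # x_{t-1}; first value unknown

print(trend)
print(dummies[2])   # indicator for season 2
print(step)
print(lag1_adv)
```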
More predictors are not always better.
Adding predictors always improves in-sample fit (R² never decreases). But out-of-sample forecast accuracy can deteriorate with too many predictors — this is overfitting.
Information criteria penalize model complexity to select the right number of predictors:
  • AIC (Akaike): minimizing AIC favors models that forecast well. Asymptotically equivalent to leave-one-out cross-validation.
  • AICc: corrected AIC for small samples. Use instead of AIC when T/k < 40.
  • BIC: penalizes complexity more heavily than AIC; consistent for model selection.
In fpp3: glance(fit) |> select(AIC, AICc, BIC).
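For Gaussian linear regression, the criteria can be computed directly from SSR. A Python sketch using the convention in Hyndman and Athanasopoulos's FPP text (k predictors, plus intercept and error variance, give k+2 parameters); the SSR values are hypothetical:

```python
import math

# Information criteria from a model's SSR (FPP convention).
def info_criteria(ssr, T, k):
    aic = T * math.log(ssr / T) + 2 * (k + 2)
    aicc = aic + 2 * (k + 2) * (k + 3) / (T - k - 3)
    bic = T * math.log(ssr / T) + (k + 2) * math.log(T)
    return aic, aicc, bic

# A bigger model fits better in-sample (smaller SSR) but pays a penalty:
small = info_criteria(ssr=40.0, T=50, k=2)
big   = info_criteria(ssr=38.5, T=50, k=6)
print([round(v, 1) for v in small])
print([round(v, 1) for v in big])   # all three criteria prefer "small" here
```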
R² measures in-sample fit, not forecast accuracy.
R² = 1 − SSR/TSS is the fraction of variance explained by the model in the training data. It always increases when you add predictors, even useless ones.
Adjusted R² penalizes extra parameters: R̄² = 1 − (1−R²)(T−1)/(T−k−1). Better than R² for model comparison, but still an in-sample measure.
The right metric for forecasting is out-of-sample accuracy (MASE, RMSE on test data, or TSCV). A model with R² = 0.95 that cannot beat seasonal naïve on new data is useless for forecasting.
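The R² vs. adjusted R² contrast is visible with two lines of arithmetic. A Python sketch with hypothetical SSR/TSS values: adding three useless predictors nudges R² up but pushes adjusted R² down:

```python
# R^2 always rises when predictors are added (SSR can only fall),
# but adjusted R^2 can fall.
def r2_and_adj(ssr, tss, T, k):
    r2 = 1 - ssr / tss
    adj = 1 - (1 - r2) * (T - 1) / (T - k - 1)
    return r2, adj

T, tss = 30, 100.0
r2_a, adj_a = r2_and_adj(ssr=40.0, tss=tss, T=T, k=2)   # 2 predictors
r2_b, adj_b = r2_and_adj(ssr=39.9, tss=tss, T=T, k=5)   # 3 useless ones added
print(round(r2_a, 3), round(adj_a, 3))
print(round(r2_b, 3), round(adj_b, 3))   # R^2 up, adjusted R^2 down
```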
Nonlinear relationships can often be linearized through transformation.
Common linearizing transformations:
Model | Equation | Interpretation of β1
Log-log | log y = β0 + β1 log x | Elasticity: 1% ↑ x ⇒ β1% ↑ y
Log-linear | log y = β0 + β1 x | Semi-elasticity: 1 unit ↑ x ⇒ 100β1% ↑ y
Linear-log | y = β0 + β1 log x | 1% ↑ x ⇒ β1/100 unit ↑ y
When forecasting from a log model, remember to back-transform with a bias correction: E[y] ≈ exp(fitted + ½σ̂²).
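The bias correction in one line of Python, with hypothetical numbers: exponentiating the log-scale forecast alone gives the median of y; adding half the residual variance approximates the mean, which is always larger:

```python
import math

# Back-transforming a forecast made on the log scale.
fitted_log = 4.6          # model's forecast of log(y) (hypothetical)
sigma2 = 0.25             # estimated residual variance on the log scale

median_forecast = math.exp(fitted_log)               # naive back-transform
mean_forecast = math.exp(fitted_log + sigma2 / 2)    # bias-corrected mean
print(round(median_forecast, 1), round(mean_forecast, 1))
```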

Producing forecasts from a regression model

Ex ante forecast — uses only information available at the forecast origin.
  • Future values of predictors must themselves be forecast (or be known in advance).
  • Example: forecasting electricity demand using forecast temperature.
Ex post forecast — uses actual future predictor values.
  • Only useful for model evaluation, not real-time forecasting.
  • Isolates the regression model’s contribution from predictor forecast errors.
Prediction intervals must account for two sources of uncertainty.
  • Uncertainty in the error term εT+h.
  • Uncertainty in the estimated coefficients β̂.
  • In practice: use fpp3’s forecast() which handles both automatically.
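In the one-predictor case the two uncertainty sources appear as separate terms under the square root. A Python sketch with hypothetical inputs, using 1.96 as a normal approximation to the t critical value:

```python
# 95% prediction interval for one-predictor regression. The "1" under the
# square root is the error-term uncertainty; the 1/T and leverage terms
# come from coefficient uncertainty.
T = 40
sigma_hat = 2.0           # residual standard deviation
xbar, sxx = 10.0, 300.0   # predictor mean and sum of squared deviations
x0 = 14.0                 # predictor value at the forecast point
y_hat = 55.0              # point forecast at x0

se_pred = sigma_hat * (1 + 1 / T + (x0 - xbar) ** 2 / sxx) ** 0.5
lower, upper = y_hat - 1.96 * se_pred, y_hat + 1.96 * se_pred
print(round(lower, 2), round(upper, 2))
```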
In fpp3, TSLM() fits time series linear models with convenient shorthand.
# Trend + seasonality
fit <- data |> model(TSLM(y ~ trend() + season()))
# With external predictor
fit <- data |> model(TSLM(y ~ x1 + x2 + trend()))
# Forecast with new predictor values
fit |> forecast(new_data = future_scenarios)
report(fit) gives coefficients, standard errors, t-statistics, and R². gg_tsresiduals(fit) checks assumptions.
Correlation is sufficient for forecasting; causation is required for policy.
A regression model can produce accurate forecasts even if the predictor-outcome relationship is not causal — as long as the correlation is stable and the predictor is available before the outcome.
Example: shoe sales in the prior month might correlate with consumer spending next month. Even if there is no causal story, the correlation is useful for forecasting — provided it persists.
Danger: spurious correlations (two series that both trend upward) produce high R² but zero out-of-sample predictive value. Always evaluate forecast accuracy on held-out data, regardless of how compelling the in-sample relationship looks.
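The spurious-correlation trap is easy to reproduce. A Python sketch with two deterministic, causally unrelated constructions that happen to share an upward trend; the in-sample R² is near 1 anyway:

```python
import math

# Two series that share a trend but are otherwise unrelated still
# produce a very high in-sample R^2.
T = 60
x = [t + math.sin(t) for t in range(1, T + 1)]          # trends upward
y = [2 * t + math.cos(3 * t) for t in range(1, T + 1)]  # also trends upward

xbar = sum(x) / T
ybar = sum(y) / T
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
tss = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - ssr / tss
print(round(r2, 4))   # close to 1 despite no causal link
```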
Including the right controls prevents confounded estimates.
If a variable Z causes both X and Y and is omitted from the regression, the coefficient on X will absorb the effect of Z. The estimated β̂1 is biased.
The size of the bias follows the omitted-variable-bias formula: Bias = γ · δ, where γ is the effect of Z on Y and δ = Cov(X, Z)/Var(X) is the slope from regressing the omitted variable Z on the predictor X.
In forecasting, omitted variable bias matters most when (1) you need to interpret coefficients, or (2) the omitted variable will change in the forecast period in a way that breaks the historical correlation.
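The bias formula can be verified numerically. A Python sketch with deterministic, made-up series in which Y depends on both X and Z: regressing Y on X alone yields a slope of exactly β + γ · Cov(X,Z)/Var(X):

```python
import math

# Numeric check of omitted-variable bias. When Z is left out, the slope
# on X absorbs gamma * Cov(X,Z)/Var(X).
T = 50
z = list(range(1, T + 1))
x = [zi + 3 * math.sin(zi) for zi in z]     # X correlated with Z
beta, gamma = 1.5, 0.8
y = [beta * xi + gamma * zi for xi, zi in zip(x, z)]

def slope(u, v):
    """OLS slope from regressing v on u."""
    ub = sum(u) / len(u)
    vb = sum(v) / len(v)
    return (sum((ui - ub) * (vi - vb) for ui, vi in zip(u, v))
            / sum((ui - ub) ** 2 for ui in u))

b_short = slope(x, y)     # regression omitting Z: biased slope on X
delta = slope(x, z)       # slope of Z on X
print(round(b_short, 4), round(beta + gamma * delta, 4))  # these match
```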